Reading Material

Please read the following material before class starts.

  1. An Introduction to R

  2. R for Data Science


Pretest

Based on the reading material above, please answer the following questions and collect the answers via Google Classroom.

  1. How many data types and structures are there in the R programming language?

  2. State the type of control flow?

  3. What do the R Packages mean?


Introduction to R Programming

R is a popular programming language for statistics and data analytics. In this module we will learn the basics of R programming for data analytics.

Mathematical Operation

Like other programming languages in general, R can perform mathematical operations such as: addition, subtraction, multiplication, and etc.

# Addition
5 + 5
# Subtraction
5 - 5
# Multiplication
3 * 5
# Division
5 / 2
# Power
2^5
# Modulo
28 %% 6

Storing value in Variable

R allows us to store values as variables. To do this, we can use the command <- or =

# Example of store value in variable with "<-"
example_variable <- 4

The value stored in a variable is not immediately displayed. We have to use the command print(varible_name) to display it.

# View the value in a variable with print()
print(example_variable)
## [1] 4

The following is an example of storing values in a varible using `=

# Example of store value in variable with "="
example_variable_2 <- 2

# Print Variable
print(example_variable_2)
## [1] 2

Both commands are working fine. However, in the R community it is more common to use <-

Data Type

R has 5 main data types namely:

Numeric
# Examples of numeric data type
n <- 1.2

To see the data type we can use class(variable_name) function

# View data types of n
class(n)
## [1] "numeric"

To see the value of the data we can use print(variable_name) function, as we have learned before

# View data values
print(n)
## [1] 1.2
Integer
# Examples of integer data type
i <- 2L
Character
# Examples of character data type
c <- "MBA ITB"
Logical
# Examples of logical data type
l <- TRUE
Complex
# Examples of complex data type
com <- 3 + 2i

Data Structure

R has 5 main data sctructure namely:

Vector
# Example of Vector
v <- c("banana", "apple", "tomato")
List
# Example of Vector
li <- list(1, "a", TRUE, 1 + 4i)
Matrix
# Example of Matrix
m <- matrix(c(1:3))
Factor
# Example of Factor
f <- factor(c(
  "Male",
  "Female",
  "Female",
  "Male",
  "Male",
  "Female"
))
Dataframe
df_student <- data.frame(
  name = c("Dito", "Dian", "Rosidi"),
  age = c(22, 23, 27),
  sex = c("Male", "Female", "Male")
)

Function

A function is a set of command lines or code arranged together to perform a specific task. In R, we can use built-in functions or create new ones.

Built-in / Base Functions
# Creating Age Vectors
age <- c(22, 23, 21, 8, 10, 14, 15)
# Calculates the average age
mean(age)
## [1] 16.14286
Custom Function
# Create a function to calculate Return of Investment
roi <- function(profit, cost_of_investment) {
  return((profit / cost_of_investment) * 0.1)
}
# Using the ROI function that has been created
roi(profit = 175, cost_of_investment = 75)
## [1] 0.2333333

Package

R Package is a collection of functions / functions or data in R. There are many packages created / developed by other users. To be able to use the existing functions in a package, we must install and load the package first. The following are examples of packages that are commonly used.

Readxl: Import Excel Data

The readxl package is useful for importing data in .xlsx or .xls format into the R environment.

# Install package
install.packages("readxl")
# Load package
library(readxl)
# Import .xlsx data
df_xlsx <- read_xls("data/Superstore.xls")
# View Data
df_xlsx
Readr: Import CSV Data

The readr package is useful for importing data in .csv format or other text formats into the R environment.

# Install package
install.packages("readr")
# Load package
library(readr)
# Import .csv data
df_csv <- read_csv("data/Superstore.csv")
# View Data
df_csv
Dplyr: Data Transformation

Dplyr is an R package that is widely used for data transformation and exploration.

# Install package
install.packages("dplyr")
# Load package
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.4
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

The following are some examples of common functions from the dplyr package.

Select

The first function is select() which is used to select a particular column.

# Select variabel "order_date" dan "customer_name"
df_select <- select(df_csv, c("order_date", "customer_name"))

# Show the first 6 row
head(df_select)
Filter

The next function is the filter() function which functions to filter data based on a condition.

# Filter segment = Corporate
df_filter <- filter(df_csv, segment == "Corporate")

# Show the first 6 row
head(df_filter)
Mutate

The mutate() function can be used to create new variables.

# Create a new variable using the mutate()
df_mutate <- mutate(df_csv, cost = sales - profit)

# Show the first 6 row
head(df_mutate)
summarise

The summarise() function is used to summarize several data values ​​into a single value. This function will be very useful when combined with other functions in dplyr. The summarise functions that can be used include mean(), median(), sd(), min(), max(), quantile(), first(), last (). For example, we will calculate the average sales.

# Calculate the average sales
df_summarise <- summarise(df_csv, mean(sales))

# Show calculation results
print(df_summarise)
## # A tibble: 1 x 1
##   `mean(sales)`
##           <dbl>
## 1          230.
Arrange

The arrange() function can be used to sort values by a column.

# Sort data by "profit"
df_arrange <- arrange(df_csv, profit)

# Show the first 6 row
head(df_arrange)
Pipe

The pipe function denoted by the notation %>% is used to create a continuous function.

# Finding the average sales rate for the corporate segment
df_pipe <- df_csv %>%
  filter(segment == "Corporate") %>%
  summarise(mean(sales))

# Show calculation results
print(df_pipe)
## # A tibble: 1 x 1
##   `mean(sales)`
##           <dbl>
## 1          234.

Post Test

  1. Create a dataframe containing information from 10 of your classmates (Name, Age, Gender, Hobbies)

  2. Create a function to calculate the Return on Equity

  3. Find out the 5 R packages and explain their functions/purpose